Learning unification-based natural language grammars

Author

  • Miles Osborne
Abstract

Practical text processing systems need wide covering grammars. When parsing unrestricted language, such grammars often fail to generate all of the sentences that humans would judge to be grammatical. This problem undermines successful parsing of the text and is known as undergeneration. There are two main ways of dealing with undergeneration: either by sentence correction, or by grammar correction. This thesis concentrates upon automatic grammar correction (or machine learning of grammar) as a solution to the problem of undergeneration. Broadly speaking, grammar correction approaches can be classified as being either data-driven, or model-based. Data-driven learners use data-intensive methods to acquire grammar. They typically use grammar formalisms unsuited to the needs of practical text processing and cannot guarantee that the resulting grammar is adequate for subsequent semantic interpretation. That is, data-driven learners acquire grammars that generate strings that humans would judge to be grammatically ill-formed (they overgenerate) and fail to assign linguistically plausible parses. Model-based learners are knowledge-intensive and are reliant for success upon the completeness of a model of grammaticality. But in practice, the model will be incomplete. Given that in this thesis we deal with undergeneration by learning, we hypothesise that the combined use of data-driven and model-based learning would allow data-driven learning to compensate for model-based learning’s incompleteness, whilst model-based learning would compensate for data-driven learning’s unsoundness. We describe a system that we have used to test the hypothesis empirically. The system combines data-driven and model-based learning to acquire unification-based grammars that are more suitable for practical text parsing. Using the Spoken English Corpus as data, and by quantitatively measuring undergeneration, overgeneration and parse plausibility, we show that this hypothesis is correct.
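
The two error rates named in the abstract can be illustrated with a minimal sketch. It assumes a grammar is available only as a yes/no acceptance test, and it defines undergeneration as the fraction of grammatical sentences rejected and overgeneration as the fraction of ill-formed strings accepted. These definitions, the function names, and the toy data are illustrative assumptions, not the measures or data used in the thesis.

```python
from typing import Callable, Iterable


def rejection_rate(accepts: Callable[[str], bool], sentences: Iterable[str]) -> float:
    """Return the fraction of the given sentences that the grammar rejects."""
    items = list(sentences)
    if not items:
        raise ValueError("need at least one sentence")
    rejected = sum(1 for s in items if not accepts(s))
    return rejected / len(items)


def undergeneration(accepts: Callable[[str], bool], grammatical: Iterable[str]) -> float:
    """Fraction of grammatical sentences the grammar fails to accept (assumed definition)."""
    return rejection_rate(accepts, grammatical)


def overgeneration(accepts: Callable[[str], bool], ungrammatical: Iterable[str]) -> float:
    """Fraction of ill-formed strings the grammar wrongly accepts (assumed definition)."""
    return 1.0 - rejection_rate(accepts, ungrammatical)


# Hypothetical toy "grammar": a finite set of accepted strings stands in for a parser.
TOY_LANGUAGE = {"the dog barks", "the dogs bark", "the dog bark"}


def toy_accepts(sentence: str) -> bool:
    return sentence in TOY_LANGUAGE


# Hypothetical judgements, invented for this example only.
grammatical = ["the dog barks", "the dogs bark", "a dog barks", "dogs bark"]
ungrammatical = ["the dog bark", "dog the barks"]

print("undergeneration:", undergeneration(toy_accepts, grammatical))
print("overgeneration:", overgeneration(toy_accepts, ungrammatical))
```

In this framing, a learner that only adds rules can lower the first number while raising the second, which is the trade-off the combined approach is meant to control.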

Related resources

Learning and Parsing Stochastic Unification-Based Grammars

Stochastic Unification-Based Grammars combine knowledge-rich and data-rich approaches to natural language processing. This provides a rich structure to the learning and parsing (decoding) tasks that can be described with undirected graphical models. While most work to date has treated parsing as a straightforward multi-class classification problem, we are beginning to see how this structure can...

Computational properties of Unification Grammars

There is currently considerable interest among computational linguists in grammatical formalisms with highly restricted generative power. This is based on the argument that a grammar formalism should not merely be viewed as a notation, but as part of the linguistic theory. It is now generally accepted that CFGs lack the generative power needed for this purpose. Unification grammars have the abi...

FCGlight: Making the bridge between Fluid Construction Grammars and main-stream unification grammars by using feature constraint logics

Fluid Construction Grammars (FCGs) are a flavour of Construction Grammars, which themselves are unification-based grammars. Its syntax is (only) to a certain extent similar to other unification-based grammars. However, it lacks a declarative semantics, while its procedural semantics is truly particular, compared to other unification-based grammar formalisms. Here we propose the re-definition o...

Highly Constrained Unification Grammars

Unification grammars are widely accepted as an expressive means for describing the structure of natural languages. In general, the recognition problem is undecidable for unification grammars. Even with restricted variants of the formalism, off-line parsable grammars, the problem is computationally hard. We present two natural constraints on unification grammars which limit their expressivity an...

Rough Sets and Learning by Unification

We apply basic notions of the theory of rough sets, due to Pawlak [30, 31], to explicate certain properties of unification-based learning algorithms for categorial grammars, introduced in [6, 11] and further developed in e.g. [18, 19, 24, 25, 26, 9, 28, 14, 15, 3]. The outcomes of these algorithms can be used to compute both the lower and the upper approximation of the searched language with re...

Journal title:
  • CoRR

Volume: cmp-lg/9502002

Pages: -

Publication date: 1994